ABSTRACT

For estimating the variance of the regression estimator in
simple random sampling without replacement, several design-based
and model-based estimators and a new class of estimators are
compared. Their second order expressions and biases are derived
and compared. Empirical results on the biases and MSE's of the
variance estimators and the conditional and unconditional coverage
probabilities of their associated t-intervals lend support to the
theoretical results and suggest further questions.
SIGNIFICANCE AND EXPLANATION

In estimating the population mean of a character y, we
often make use of an auxiliary covariate x about which information
is more readily available and which is positively correlated with
y. One commonly used estimator in survey sampling is the regression
estimator. To assess the variability of the estimator, we need an
estimator for its variance. Several variance estimators have been
proposed using model-based or design-based arguments. We propose
a class of variance estimators, which includes or approximates
several existing variance estimators in the literature. The
asymptotic variance and bias of these estimators are found and
compared with results from an empirical study. Empirical results
on coverage probabilities of Student's t-intervals with these
variance estimators are also obtained and a proper interpretation
is given.
1. Introduction

The main purpose of this paper is to provide a theoretical and
empirical comparison of several variance estimators for the regression
estimator in simple random sampling without replacement. The companion
problem for the ratio estimator has been well studied in the
literature. See the references of Wu and Deng (1983) and Rao (1985). In the
past more attention has been given to the ratio estimator because of
its computational ease and applicability to general sampling
designs. The ratio estimator is appropriate for populations whose
regression line passes close to the origin. If the intercept of the
regression line is significantly nonzero, it is much less efficient
than the regression estimator (Deng, 1984). In general, apart from $n^{-2}$
terms, the mean squared error of the former is bigger than that of the
latter (Cochran, 1977, p.196). For estimating cell totals in tables of
the type typically constructed from survey data, Fuller (1977) showed
the superior performance of the regression estimator. For stratified
samples Wu (1985) showed that the model underlying the use of the
combined ratio estimator has an artificial constraint while the model for
the combined regression estimator is more natural. Given the present
availability of fast and inexpensive computing, the computational
advantage of the ratio estimator should be less of a concern and the
regression estimator will gain wider popularity.

There are two approaches to the variance estimation problem. The
traditional one, based on the probability distribution generated by
the sampling design, is well summarized in Cochran's book. By imposing
a superpopulation model on the actual finite population, inference
about the characteristics of the finite population can be made via the
structure of the model (Brewer, 1963; Scott and Smith, 1969; Royall,
1970). Several model-based variance estimators were proposed and
studied in Royall and Eberhardt (1975) and Royall and Cumberland (1978). For
the regression estimator, an empirical study of these model-based
variance estimators and a traditional estimator $v_{lr}$ (2.7) was given
in Royall and Cumberland (1981). Several traditional estimators were
compared in earlier studies by Rao (1968, 1969). The estimators in the
above three papers and some new ones (formula (2.9)) will be studied
in our paper. Our theoretical comparison of these design-based and
model-based variance estimators is design-based, although some results
are given a model-based interpretation. More precise results are made
possible by the second order expansions of these estimators reported
in Section 3. Our simulation study contains two new features, the mean
squared errors (MSE) of the variance estimators and the conditional
coverage probabilities of the associated t-intervals.

The organization and major findings of this paper are as follows.
Section 2 lists all the variance estimators under comparison,
including a class of adjustments, (2.9), of the standard variance estimator
$v_{lr}$ (2.7). The optimal adjustment within the class (2.9) is studied
in Section 3.2 in parallel to Wu (1982a). From their respective
asymptotic expansions, the jackknife estimator $v_J$ (2.20) and two
bias-robust estimators $v_D$ (2.13) and $v_H$ (2.14) have the same leading
term of order $n^{-1}$. For the next order terms, $v_J$ is bigger than $v_D$,
which in turn is bigger than $v_H$. The same expansions also enable us
to compute the biases of these estimators for estimating the MSE of
the regression estimator $\bar{y}_{lr}$ (2.1). To achieve this goal, a new
expansion (to order $n^{-2}$) for MSE($\bar{y}_{lr}$) is derived in Theorem 4.1.
Among all the estimators, $v_D$ is the only one that captures the $n^{-1}$
and $n^{-2}$ terms of MSE($\bar{y}_{lr}$). Its absolute bias is of order $n^{-2.5}$ and is
the smallest. The jackknife estimator $v_J$ overestimates MSE($\bar{y}_{lr}$)
while $v_H$ underestimates MSE($\bar{y}_{lr}$). A condition (4.7) (which is often
satisfied by natural populations) is found, under which the commonly
used estimator $v_{lr}$ and another one $v_L$ underestimate MSE($\bar{y}_{lr}$). The
findings on bias are well supported by Royall and Cumberland's (1981) study
(summarized in Table 1) and our study in Section 5 (Table 2). The
empirical MSE behavior (Table 2) of the different variance estimators
supports the theoretical result Theorem 3.1. Those $v_g$ with g chosen to be
$g_{opt}$ (2.10) have smaller MSE's. An interesting and somewhat
surprising finding is that the jackknife variance estimator $v_J$ consistently
has the largest MSE. Typically the two model-based estimators $v_D$ and
$v_H$ have bigger MSE's. If the MSE of $\bar{y}_{lr}$ is the primary parameter of
interest, as in determining the sample size for future surveys, the
optimal estimator $v_{\hat{g}}$ should be used in place of $v_J$, $v_D$ or $v_H$. For
coverage probabilities of t-intervals of the form (5.2), which are
relevant to internal inference about the population mean, we observe a
reverse pattern. In terms of the closeness of the empirical
unconditional coverage probabilities to the nominal level (Table 3), we have
$v_J > v_D > v_H > v_2 > [\ldots] > v_{lr}$ in decreasing order of performance.

In terms of the stability and closeness (to the nominal level) of the
coverage probabilities conditional on the sample mean of the
covariate, a similar pattern is observed. This is interesting since the
losers $v_J$, $v_D$ and $v_H$ for estimating MSE($\bar{y}_{lr}$) turn out to be the
big winners here. Perhaps the most important recommendation for
practitioners is that the commonly used estimator $v_{lr}$ fails on both
grounds and should only be used with caution. An obvious conclusion is
that different variance estimators should be used for different
purposes. Further theoretical study is needed to understand this
empirical phenomenon (the same phenomenon was observed in Wu and Deng's
empirical study for the ratio estimator).

The restriction to simple random sampling without replacement
will undoubtedly rule out many large scale complex surveys. We hope
our study will inspire further interest and eventually lead to useful
recommendations for more complex situations. In settings like
marketing research, simulation analysis (Iglehart, 1978) and telephone
surveys where simple random sampling is a key element of the sampling
plan, our results may be directly applicable.


2.  Variance  Estimation  For  Regression  Estimator 

Consider a population consisting of N distinct units with values
$(x_i, y_i)$, $i = 1(1)N$, with $x_i$ positive and known. Samples are drawn
from the population at random without replacement. Denote the sample
and population means of $y_i$ and $x_i$ by $\bar{y}$, $\bar{x}$ and $\bar{Y}$, $\bar{X}$ respectively.
Two estimators of $\bar{Y}$ commonly used in practice are the ratio estimator

\[ \bar{y}_R = \frac{\bar{y}}{\bar{x}} \bar{X} \]

and the regression estimator

\[ \bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}), \tag{2.1} \]

where

\[ b = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{2.2} \]

is the sample regression coefficient of $y_i$ on $x_i$. The regression
estimator is the best linear unbiased predictor of $\bar{Y}$ under the
following superpopulation model (Royall, 1970)

\[ y_i = \alpha + \beta x_i + \epsilon_i, \tag{2.3} \]

where the $\epsilon_i$ are uncorrelated with mean zero and variance $\sigma^2$. The
superpopulation model underlying the use of the ratio estimator is the one
without the intercept term.

The leading term of the mean squared error (MSE) or variance of
$\bar{y}_{lr}$ is

\[ V = \frac{1-f}{n} \, \frac{1}{N-1} \sum_{i=1}^{N} e_i^2, \tag{2.4} \]

where

\[ e_i = (y_i - \bar{Y}) - B(x_i - \bar{X}) \tag{2.5} \]

is the residual of $y_i$ to the regression line $\bar{Y} + B(x_i - \bar{X})$,

\[ B = \sum_{i=1}^{N} (x_i - \bar{X})(y_i - \bar{Y}) \Big/ \sum_{i=1}^{N} (x_i - \bar{X})^2 \tag{2.6} \]

is the population regression coefficient of $y_i$ on $x_i$, and $f = n/N$
is the sampling fraction.

The most commonly used estimator of the approximate variance V is
its sample analogue

\[ v_{lr} = \frac{1-f}{n} \, \frac{1}{n-2} \sum_{i=1}^{n} \hat{e}_i^2, \tag{2.7} \]

where

\[ \hat{e}_i = (y_i - \bar{y}) - b(x_i - \bar{x}) \tag{2.8} \]

is the i-th residual based on the sample and b is given in (2.2).
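The definitions (2.1), (2.2), (2.7) and (2.8) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names and the ten-unit toy population are hypothetical and are not part of the original study. The toy y values are exactly linear in x, so the regression estimate reproduces the population mean and the sample residuals vanish.

```python
def regression_fit(xs, ys):
    """Return (x_bar, y_bar, b, residuals) for one sample; b follows (2.2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Sample residuals as in (2.8).
    resid = [(y - y_bar) - b * (x - x_bar) for x, y in zip(xs, ys)]
    return x_bar, y_bar, b, resid


def y_lr(xs, ys, X_bar):
    """Regression estimator (2.1) of the population mean of y."""
    x_bar, y_bar, b, _ = regression_fit(xs, ys)
    return y_bar + b * (X_bar - x_bar)


def v_lr(xs, ys, N):
    """Standard variance estimator (2.7), with divisor n - 2."""
    n = len(xs)
    f = n / N  # sampling fraction
    _, _, _, resid = regression_fit(xs, ys)
    return (1 - f) / n * sum(e * e for e in resid) / (n - 2)


# Hypothetical toy population of N = 10 units (illustrative values only).
pop_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
pop_y = [2.0 + 3.0 * x for x in pop_x]  # exactly linear in x
X_bar = sum(pop_x) / len(pop_x)

sample_idx = [0, 3, 6, 9]  # one fixed sample of size n = 4
sx = [pop_x[i] for i in sample_idx]
sy = [pop_y[i] for i in sample_idx]

estimate = y_lr(sx, sy, X_bar)
variance = v_lr(sx, sy, N=len(pop_x))
```

With noisy data the same functions apply unchanged; only the residuals, and hence the variance estimate, become nonzero.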


For estimating the variance of the ratio estimator, Wu (1982a)
considered $v_g = (\bar{X}/\bar{x})^g v_0$ as a class of adjustments of the usual
estimator (Cochran, 1977, p.155)

\[ v_0 = \frac{1-f}{n} \, \frac{1}{n-1} \sum_{i=1}^{n} \Big( y_i - \frac{\bar{y}}{\bar{x}} x_i \Big)^2. \tag{2.8.1} \]

He then proposed to choose g by minimizing the mean squared error of
$v_g$. In an empirical study by Wu and Deng (1983), the optimal $v_g$
performs well among several other variance estimators. In the regression
case we will consider a similar class of variance estimators

\[ v_g = \Big( \frac{\bar{X}}{\bar{x}} \Big)^g v_{lr}. \tag{2.9} \]


Let $S_{zx}$ denote the population covariance of $x_i$ and $z_i$, and $S_x^2$ the
population variance of $x_i$. It will be shown in Theorem 3.1 that the
leading term of MSE($v_g$) is minimized by

\[ g_{opt} = \frac{S_{zx} / (\bar{X} \bar{Z})}{S_x^2 / \bar{X}^2}, \tag{2.10} \]

which is the population regression coefficient of $z_i/\bar{Z}$ over $x_i/\bar{X}$,
$i = 1(1)N$, where $z_i = e_i^2$ is the residual squared. This suggests the
following optimal estimator within the class (2.9),

\[ v_{\hat{g}} = \Big( \frac{\bar{X}}{\bar{x}} \Big)^{\hat{g}} v_{lr}, \tag{2.11} \]

where $\hat{g}$ is the sample analog of $g_{opt}$.
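A minimal sketch of the class (2.9) and of the plug-in estimator (2.11) follows. It assumes that the "sample analog" of (2.10) is obtained by replacing population moments with sample moments and the population residuals $e_i$ with the sample residuals $\hat{e}_i$; the text does not spell out these details, so that choice, the function names and the data are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Covariance with divisor len - 1 (the divisor cancels in the ratio below)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)


def residuals(xs, ys):
    """Residuals from the least squares line through the given points."""
    xb, yb = mean(xs), mean(ys)
    b = cov(xs, ys) / cov(xs, xs)
    return [(y - yb) - b * (x - xb) for x, y in zip(xs, ys)]


def g_coefficient(xs, zs):
    """Regression coefficient of z_i/mean(z) on x_i/mean(x), as in (2.10)."""
    return (cov(xs, zs) / (mean(xs) * mean(zs))) / (cov(xs, xs) / mean(xs) ** 2)


def v_g(xs, ys, N, X_bar, g):
    """Adjusted estimator (2.9): (X_bar / x_bar)^g times v_lr."""
    n = len(xs)
    f = n / N
    e = residuals(xs, ys)
    v_lr = (1 - f) / n * sum(r * r for r in e) / (n - 2)
    return (X_bar / mean(xs)) ** g * v_lr


def v_g_hat(xs, ys, N, X_bar):
    """Plug-in version (2.11); g_hat uses squared sample residuals (assumption)."""
    z_hat = [r * r for r in residuals(xs, ys)]
    g_hat = g_coefficient(xs, z_hat)
    return v_g(xs, ys, N, X_bar, g_hat), g_hat


# Hypothetical sample of n = 6 units from a population with N = 20, X_bar = 6.5.
xs = [1.0, 2.0, 4.0, 7.0, 9.0, 12.0]
ys = [3.1, 4.9, 9.2, 14.8, 19.5, 24.1]
v0 = v_g(xs, ys, N=20, X_bar=6.5, g=0.0)  # g = 0 gives v_lr itself
v_opt, g_hat = v_g_hat(xs, ys, N=20, X_bar=6.5)
```

Two quick sanity checks on the design: when $z_i$ is proportional to $x_i$ the coefficient equals 1, and for a sample with $\bar{x} = \bar{X}$ every member of the class coincides with $v_{lr}$.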

For variance estimation of the ratio estimator, Fuller (1981)
suggested a regression adjustment to $v_0$ (2.8.1). A similar adjustment
can be applied to $v_{lr}$. For the ratio estimator, as pointed out in Wu
and Deng (1983), Fuller's estimator is asymptotically equivalent to
$(\bar{X}/\bar{x})^{\hat{g}} v_0$, where $\hat{g}$ is the sample analogue of the optimal $g_{opt}$. The
corresponding result is also true for the regression estimator.

Another variance estimator closely related to $v_{lr}$ is

\[ v_L = v_{lr} \Big[ 1 + \frac{(\bar{x} - \bar{X})^2}{\big( \frac{1-f}{n} \big) \sum_{i=1}^{n} (x_i - \bar{x})^2} \Big], \tag{2.12} \]

whose justification comes from standard regression theory (Cochran,
1977, p.199).


Royall and Cumberland (1978) proposed two bias-robust (against
misspecification in model (2.3)) variance estimators

\[ v_D = \frac{(1-f)^2}{n(n-1)} \sum_{i=1}^{n} \frac{r_i^2 + f/(1-f)}{1 - (x_i - \bar{x})^2 / ((n-1) g(s))} \, \hat{e}_i^2 \tag{2.13, 2.15} \]

and

\[ v_H = \Big( \frac{1-f}{n} \Big)^2 \sum_{i=1}^{n} \rho_i \hat{e}_i^2 + f \Big( \frac{1-f}{n} \Big) \frac{1}{n-2} \sum_{i=1}^{n} \hat{e}_i^2, \tag{2.14} \]

where

\[ g(s) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x}_r = \frac{N \bar{X} - n \bar{x}}{N - n}, \tag{2.16} \]

\[ r_i = 1 + (x_i - \bar{x})(\bar{x}_r - \bar{x}) / g(s), \tag{2.17} \]

\[ \rho_i = r_i^2 \Big/ \Big( 1 - \sum_{j=1}^{n} w_j k_j \Big), \qquad w_i = r_i^2 \Big/ \sum_{j=1}^{n} r_j^2, \tag{2.18} \]

and

\[ k_i = [1 + (x_i - \bar{x})^2 / g(s)] / n. \tag{2.19} \]

The last estimator under comparison is the jackknife variance
estimator

\[ v_J = \Big( \frac{1-f}{n} \Big) (n-1) \sum_{i=1}^{n} (\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)})^2, \tag{2.20} \]

where $\hat{\theta}_{(i)}$ is the regression estimate (2.1) based on the sample of
size n-1 with unit i deleted from the sample and $\hat{\theta}_{(\cdot)}$ is the
average of the $\hat{\theta}_{(i)}$.
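The three estimators $v_D$, $v_H$ and $v_J$ can be sketched as below. The formulas in the code are a reading of (2.13) to (2.20) in the Royall and Cumberland form, with the weight on each squared residual in $v_D$ written out inline; the function names and the small data set are hypothetical. One check that does not depend on any asymptotics: for a balanced sample ($\bar{x} = \bar{X}$) every $r_i$ equals 1, the weights $w_i k_i$ sum to $2/n$, and $v_H$ reduces exactly to $v_{lr}$.

```python
def fit(xs, ys):
    """Least squares fit on one sample: means, slope, sum of squares, residuals."""
    n = len(xs)
    xb = sum(xs) / n
    yb = sum(ys) / n
    sxx = sum((x - xb) ** 2 for x in xs)
    b = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    resid = [(y - yb) - b * (x - xb) for x, y in zip(xs, ys)]
    return xb, yb, b, sxx, resid


def v_D_and_v_H(xs, ys, N, X_bar):
    """Bias-robust estimators, following the reading of (2.13)-(2.19) above."""
    n = len(xs)
    f = n / N
    xb, _, _, sxx, e = fit(xs, ys)
    g_s = sxx / n                              # g(s)
    x_r = (N * X_bar - n * xb) / (N - n)       # mean of x over non-sampled units
    r = [1 + (x - xb) * (x_r - xb) / g_s for x in xs]
    k = [(1 + (x - xb) ** 2 / g_s) / n for x in xs]
    v_d = (1 - f) ** 2 / (n * (n - 1)) * sum(
        (ri * ri + f / (1 - f)) / (1 - (x - xb) ** 2 / ((n - 1) * g_s)) * ei * ei
        for ri, x, ei in zip(r, xs, e)
    )
    r2 = [ri * ri for ri in r]
    total_r2 = sum(r2)
    w = [v / total_r2 for v in r2]
    denom = 1 - sum(wi * ki for wi, ki in zip(w, k))
    rho = [v / denom for v in r2]
    v_h = ((1 - f) / n) ** 2 * sum(p * ei * ei for p, ei in zip(rho, e)) \
        + f * (1 - f) / n * sum(ei * ei for ei in e) / (n - 2)
    return v_d, v_h


def v_J(xs, ys, N, X_bar):
    """Delete-one jackknife estimator (2.20) of the regression estimator."""
    n = len(xs)
    f = n / N
    thetas = []
    for i in range(n):
        sub_x = xs[:i] + xs[i + 1:]
        sub_y = ys[:i] + ys[i + 1:]
        xb, yb, b, _, _ = fit(sub_x, sub_y)
        thetas.append(yb + b * (X_bar - xb))
    t_bar = sum(thetas) / n
    return (1 - f) / n * (n - 1) * sum((t - t_bar) ** 2 for t in thetas)


# Hypothetical balanced sample: N = 10, X_bar = 5.5, and the sample mean of x is 5.5.
xs = [2.0, 4.0, 7.0, 9.0]
ys = [3.0, 8.0, 12.0, 20.0]
v_d, v_h = v_D_and_v_H(xs, ys, N=10, X_bar=5.5)
v_j = v_J(xs, ys, N=10, X_bar=5.5)
```

The jackknife loop refits the line n times, which is cheap at the sample sizes considered here; a closed form based on leverages would avoid the refits but is not needed for a sketch.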


3.  Relationships  among  the  Variance  Estimators  under  Comparison 
3.1.  Asymptotic  Expansions 

To study the asymptotic relationships among the variance
estimators in Section 2, we need the following asymptotic expansions

\[ \delta_n = b - B = \sum_{i=1}^{n} (x_i - \bar{x})(e_i - \bar{e}) \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2 = O_p(n^{-0.5}) \tag{3.1} \]

\[ = [\bar{u} - \bar{e}(\bar{x} - \bar{X})] / (n^{-1}(n-1) s_x^2) = \bar{u}/S_x^2 + O_p(n^{-1}) \tag{3.2} \]

\[ = \frac{\bar{u} - \bar{e}(\bar{x} - \bar{X})}{S_x^2} - \frac{\bar{u}(\bar{v} - \bar{V})}{S_x^4} + O_p(n^{-1.5}), \tag{3.3} \]

where

\[ u_i = e_i (x_i - \bar{X}), \qquad v_i = (x_i - \bar{X})^2 \tag{3.4} \]

and $\bar{u}$, $\bar{v}$ are the sample means of $u_i$ and $v_i$, $\bar{V}$ the population mean of
$v_i$. Since the population means of $e_i$ and $u_i$ are zero,

\[ \bar{e} = O_p(n^{-0.5}), \qquad \bar{u} = O_p(n^{-0.5}). \tag{3.5} \]

Ignoring the lower order terms of $\delta_n^2$, we have

\[ \delta_n^2 = O_p(n^{-1}) = \frac{\bar{u}^2}{S_x^4} + O_p(n^{-1.5}) \tag{3.6} \]

\[ = \frac{\bar{u}^2 - 2 \bar{u} \bar{e} (\bar{x} - \bar{X})}{S_x^4} - \frac{2 \bar{u}^2 (\bar{v} - \bar{V})}{S_x^6} + O_p(n^{-2}). \tag{3.7} \]

In writing (3.3) and (3.7), we used $s_x^2 = S_x^2 + O_p(n^{-0.5})$.

3.2. Optimal Variance Estimators among v_g


Using the minimum mean squared error of the variance estimator as
the criterion, we will choose an optimal estimator within the class
(2.9). The following lemma finds the leading terms of $v_g$ and
Var($v_g$).

Lemma 3.1.

(a) $v_{lr} = \big( \frac{1-f}{n} \big) \bar{z} + O_p(n^{-2})$, where $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$, $z_i = e_i^2$.

(b) $v_g = \big( \frac{1-f}{n} \big) (\bar{z} - g (\delta \bar{x}) \bar{Z}) + O_p(n^{-2})$, where $\delta \bar{x} = (\bar{x} - \bar{X})/\bar{X}$.

(c) Var($v_g$) $= \big( \frac{1-f}{n} \big)^3 \big( S_z^2 - 2 g \big( \frac{\bar{Z}}{\bar{X}} \big) S_{zx} + g^2 \big( \frac{\bar{Z}}{\bar{X}} \big)^2 S_x^2 \big) + O(n^{-3.5})$.

Except for the obvious ones, the derivations and proofs in this paper
are given in the Appendix.


By minimizing expression (c) of Lemma 3.1, we have


Theorem 3.1. The optimal choice of g, minimizing the variance of $v_g$,
is given by $g_{opt}$ defined in (2.10).
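The minimization behind Theorem 3.1 is a one-line calculus step. The worked equation below writes it out, taking the leading term of Var($v_g$) to be the quadratic in g stated in Lemma 3.1(c); the symbols $Q(g)$ and $\rho_{zx}$ (the population correlation of $z_i$ and $x_i$) are introduced here for illustration only.

```latex
\begin{aligned}
Q(g) &= S_z^2 - 2g\,\frac{\bar Z}{\bar X}\,S_{zx} + g^2\,\frac{\bar Z^2}{\bar X^2}\,S_x^2, \\
\frac{dQ}{dg} &= -2\,\frac{\bar Z}{\bar X}\,S_{zx} + 2g\,\frac{\bar Z^2}{\bar X^2}\,S_x^2 = 0
\;\Longrightarrow\;
g = \frac{S_{zx}\,\bar X}{\bar Z\,S_x^2}
  = \frac{S_{zx}/(\bar X \bar Z)}{S_x^2/\bar X^2} = g_{opt}, \\
Q(g_{opt}) &= S_z^2 - \frac{S_{zx}^2}{S_x^2} = S_z^2\,(1 - \rho_{zx}^2).
\end{aligned}
```

Since the coefficient of $g^2$ is positive, the stationary point is a minimum, and the reduction relative to $g = 0$ is larger the more strongly the squared residuals are correlated with x.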

For estimating the variance of the ratio estimator, a similar
result to Theorem 3.1 was obtained in Wu (1982a) with a major
difference. His $z_i$ takes a more complex form

\[ z_i = d_i^2 - 2 \, \frac{\sum_{j=1}^{N} x_j d_j}{N \bar{X}} \, d_i, \tag{3.8} \]

where $d_i = y_i - (\bar{Y}/\bar{X}) x_i$ is the residual in the ratio context. Note
that the second term of (3.8) does not appear in the regression case.
One explanation for this difference is that the regression estimator
incorporates a non-zero intercept term while the ratio estimator
suppresses it. More precisely, each y value can be decomposed as

\[ y_i = A + B x_i + e_i, \tag{3.9} \]

where B and $e_i$ are defined in (2.6) and (2.5), and $A = \bar{Y} - B \bar{X}$ is the
intercept from fitting a regression line to the population
$(y_i, x_i)$, $i = 1(1)N$. With this representation, $d_i = -A(x_i - \bar{X})/\bar{X} + e_i$ and

\[ \sum_{i=1}^{N} x_i d_i = -A (N-1) S_x^2 / \bar{X}, \tag{3.10} \]

from which it is easy to see that the extra term in (3.8) would be
zero if the intercept were zero.
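The step from the decomposition (3.9) to the sum of $x_i d_i$ can be written out as follows. This is a sketch using only the definitions (2.5), (2.6) and (3.9): the population least squares residuals satisfy $\sum e_i = 0$ and $\sum (x_i - \bar{X}) e_i = 0$, so $\sum x_i e_i = 0$.

```latex
\begin{aligned}
d_i &= y_i - \frac{\bar Y}{\bar X}\,x_i
     = A + B x_i + e_i - \frac{A + B\bar X}{\bar X}\,x_i
     = -\frac{A\,(x_i - \bar X)}{\bar X} + e_i, \\
\sum_{i=1}^{N} x_i d_i
    &= -\frac{A}{\bar X}\sum_{i=1}^{N} x_i (x_i - \bar X) + \sum_{i=1}^{N} x_i e_i
     = -\frac{A}{\bar X}\sum_{i=1}^{N} (x_i - \bar X)^2
     = -\frac{A\,(N-1)\,S_x^2}{\bar X}.
\end{aligned}
```

The sum therefore vanishes exactly when $A = 0$, which is the case in which the ratio and regression lines coincide.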


To obtain further properties of $v_g$, let us assume the
superpopulation model

\[ y_i = \alpha + \beta x_i + \epsilon_i, \tag{3.11} \]

\[ E_m(\epsilon_i) = 0; \quad E_m(\epsilon_i \epsilon_j) = \sigma^2 x_i^t \ \text{for } i = j; \quad 0 \ \text{for } i \neq j, \]

where $E_m$ denotes expectation with respect to the model. Under (3.11),
$B = \beta + O(N^{-1/2})$ and $e_i = \epsilon_i + O(N^{-1/2})$. By using Wu's (1982a)
argument, we find that up to order $N^{-1/2}$

Theorem 3.2. Under model (3.11) with

(a) t=0, $v_0$ ($= v_{lr}$) is the optimal estimator of V among $v_g$;

(b) t=1, $v_1$ is the optimal estimator of V among $v_g$;

(c) t ≥ 1, then $g_{opt}$ ≥ 1 and $v_1$, $v_2$ are both better than $v_0$ for
estimating V.


Recall that under (3.11) with t=0, $\bar{y}_{lr}$ is the best linear
unbiased predictor of $\bar{Y}$.

3.3. Relationships among v_D, v_H and v_J

The two estimators $v_H$ and $v_D$ are approximately unbiased
estimators of the true error variance even when the error variance structure
is not correctly specified by the model. According to Theorem 3 of
Royall and Cumberland (1978), under some mild conditions, $v_H$, $v_D$ and
$v_J$ are asymptotically equivalent, i.e., $v_H = v_J(1 + o(1))$ and so
on. By studying the second order terms of the variance estimators, we
find some interesting relationships among them. We will show that $v_J$
is stochastically larger than $v_D$ and $v_D$ is larger than $v_H$. Lemmas
3.2 and 3.3 find the leading terms of $v_D$ and $v_H$. Throughout this
subsection, we assume $f = O(n^{-0.5})$.

Lemma 3.2.

\[ v_D = \frac{1-f}{n(n-1)} \sum_{i=1}^{n} \hat{e}_i^2 \, \frac{(1 - p(x_i - \bar{x}))^2}{1 - q(x_i - \bar{x})^2} + O_p(n^{-2.5}) \tag{3.12} \]

\[ = \frac{1-f}{n(n-1)} \sum_{i=1}^{n} \hat{e}_i^2 \, [1 - 2p(x_i - \bar{x}) + (p^2 + q)(x_i - \bar{x})^2] + O_p(n^{-2.5}), \tag{3.13} \]

where

\[ p = \frac{\bar{x} - \bar{X}}{g(s)}, \qquad q = \frac{1}{(n-1) g(s)} \tag{3.14} \]

and g(s) is defined in (2.16).

Lemma 3.3.

\[ v_H = \frac{1-f}{n^2} \sum_{i=1}^{n} (1 - p(x_i - \bar{x}))^2 \, \hat{e}_i^2 + O_p(n^{-2.5}). \tag{3.15} \]

From (3.13) and (3.15), we have

Lemma 3.4.

\[ v_H = v_D - \frac{1-f}{n(n-1)} \Big[ \frac{1}{n} \sum_{i=1}^{n} \hat{e}_i^2 (1 - p(x_i - \bar{x}))^2 + q \sum_{i=1}^{n} \hat{e}_i^2 (x_i - \bar{x})^2 \Big] + O_p(n^{-2.5}). \tag{3.16} \]

Lemma 3.4 implies that $v_D$ is asymptotically larger than $v_H$.
Lemma 3.5 finds the leading terms of $v_J$.

Lemma 3.5.

\[ v_J = \frac{1-f}{n(n-1)} \sum_{i=1}^{n} \hat{e}_i^2 \, \frac{(1 - p(x_i - \bar{x}))^2}{(1 - q(x_i - \bar{x})^2)^2} + O_p(n^{-2.5}). \tag{3.17} \]

We can compare $v_D$ and $v_J$ based on Lemmas 3.2 and 3.5.

Lemma 3.6.

\[ v_J = v_D + \frac{1-f}{n(n-1)} \, q \sum_{i=1}^{n} \hat{e}_i^2 (x_i - \bar{x})^2 + O_p(n^{-2.5}). \tag{3.18} \]

Lemma 3.6 implies that $v_J$ is asymptotically larger than $v_D$.
4. Asymptotic Bias Behavior of Variance Estimators


4.1. Second-order Expansions of MSE($\bar{y}_{lr}$) and v_lr

Theorem 4.1. Let V be the approximate variance (2.4).

(a) \[ MSE(\bar{y}_{lr}) = V + \Big( \frac{1-f}{n} \Big)^2 \Big( 2 S_e^2 - \frac{1-2f}{1-f} \, \frac{S_u^2}{S_x^2} + \frac{4 S_{x^2 e}^2 + 2 S_{x e^2} U_3}{S_x^4} \Big) + O(n^{-2.5}), \]

where $U_3 = (N-1)^{-1} \sum_{i=1}^{N} (x_i - \bar{X})^3$, $S_u^2$ is the population variance of
$u_i$ (3.4), and $S_{x e^2}$, $S_{x^2 e}$ are the population covariances of $x_i$ and
$e_i^2$, $x_i^2$ and $e_i$ respectively.

(b) \[ v_{lr} = \Big( \frac{1-f}{n} \Big) \Big( \frac{n-1}{n-2} \, s_e^2 - \frac{n-1}{n-2} \, \frac{\bar{u}^2}{S_x^2} \Big) + O_p(n^{-2.5}). \]

If $f = O(n^{-0.5})$, then

(c) \[ MSE(\bar{y}_{lr}) = V + \frac{1}{n^2} \Big( 2 S_e^2 - \frac{S_u^2}{S_x^2} + \frac{4 S_{x^2 e}^2 + 2 S_{x e^2} U_3}{S_x^4} \Big) + O(n^{-2.5}). \]

If $f = O(n^{-0.5})$ is relaxed to $f = o(1)$ in (c), $O(n^{-2.5})$
should be changed to $o(n^{-2})$. The same applies to Section 4.2.

4.2. Bias Behavior of v_lr, v_L, v_g, v_H, v_D, and v_J

Throughout this subsection we assume $f = O(n^{-0.5})$. For any
variance estimator v, we denote its bias for estimating MSE($\bar{y}_{lr}$) by
B(v) = E(v) - MSE($\bar{y}_{lr}$). The biases of the six variance estimators are
given below:

\[ B(v_{lr}) = -\frac{1}{n^2} \Big( S_e^2 + \frac{2 S_{x e^2} U_3 + 4 S_{x^2 e}^2}{S_x^4} \Big) + O(n^{-2.5}), \tag{4.1} \]

\[ B(v_L) = -\frac{1}{n^2} \Big( \frac{2 S_{x e^2} U_3 + 4 S_{x^2 e}^2}{S_x^4} \Big) + O(n^{-2.5}), \tag{4.2} \]

\[ B(v_g) = \frac{1}{n^2} \Big( -S_e^2 - \frac{2 S_{x e^2} U_3 + 4 S_{x^2 e}^2}{S_x^4} - g \, \frac{S_{x e^2}}{\bar{X}} + \frac{g(g+1)}{2} \, \frac{S_x^2 S_e^2}{\bar{X}^2} \Big) + O(n^{-2.5}), \tag{4.3} \]

\[ B(v_D) = O(n^{-2.5}), \tag{4.4} \]

\[ B(v_H) = -\frac{1}{n^2} \Big( S_e^2 + \frac{S_u^2}{S_x^2} \Big) + O(n^{-2.5}), \tag{4.5} \]

\[ B(v_J) = \frac{1}{n^2} \, \frac{S_u^2}{S_x^2} + O(n^{-2.5}). \tag{4.6} \]

Formula (4.5) follows from (3.16) and (4.4); (4.6) from (3.18)
and (4.4). The others are proved in the Appendix.


From (4.1) and (4.2), it is easy to see that if

\[ S_{x e^2} U_3 \geq 0, \tag{4.7} \]

then $v_L$ is less downward biased than $v_{lr}$. In fact, $S_{x e^2} U_3 \geq 0$ for
all six populations studied in Royall and Cumberland (1981). Therefore,
as expected from our results, both $v_{lr}$ and $v_L$ underestimate
MSE($\bar{y}_{lr}$) for these populations. See Table 1.


The leading term of B($v_g$) is a quadratic function in g with
positive coefficient for the quadratic term. One can easily check that
the minimum of B($v_g$) occurs at $g = g_{opt} - 0.5$, where $g_{opt}$ is defined
in (2.10) and shown to be optimal in Theorem 3.1. Furthermore, if
$S_{x e^2} U_3 \geq 0$, then this minimum
corresponds to the largest negative bias of $v_g$. This observation
agrees with the empirical study of the next section.
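The location of the minimum can be checked with a short calculation. The sketch below assumes that the g-dependent part of the leading bias has the form obtained by expanding $(\bar{X}/\bar{x})^g \approx 1 - g\,\delta\bar{x} + \tfrac{1}{2} g(g+1)(\delta\bar{x})^2$ with $\delta\bar{x} = (\bar{x} - \bar{X})/\bar{X}$, and that $\bar{Z}$ may be replaced by $S_e^2$ to this order; the symbol $h(g)$ is introduced here for illustration.

```latex
\begin{aligned}
h(g) &= -\,g\,\frac{S_{xe^2}}{\bar X} + \frac{g(g+1)}{2}\,\frac{S_x^2 S_e^2}{\bar X^2}, \\
h'(g) &= -\,\frac{S_{xe^2}}{\bar X} + \Big(g + \tfrac12\Big)\frac{S_x^2 S_e^2}{\bar X^2} = 0
\;\Longrightarrow\;
g + \tfrac12 = \frac{S_{xe^2}\,\bar X}{S_x^2\,S_e^2} \approx \frac{S_{zx}\,\bar X}{\bar Z\,S_x^2} = g_{opt}.
\end{aligned}
```

Under these assumptions the bias is most negative half a unit below the MSE-optimal exponent, which is why estimators with g near $g_{opt}$ tend to show a sizeable downward bias.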

We next observe that $v_H$, $v_J$ have biases of the order $n^{-2}$
whereas $v_D$ has a smaller order bias. Up to the order $n^{-2}$, $v_H$
underestimates MSE($\bar{y}_{lr}$), $v_J$ overestimates MSE($\bar{y}_{lr}$) and $v_D$ is
unbiased in the sense that its leading term is of order $n^{-2.5}$. For
the ratio estimator $\bar{y}_R$, the overestimation of $v_J$ for MSE($\bar{y}_R$) was
proved by Wu (1982b). The above observations are supported by the
simulation study in Section 5 (Table 2) and an empirical study on six
natural populations with sample size 32 in Royall and Cumberland
(1981, p.926), on which the following table is based.

Table 1. Relative Bias B(v)/MSE($\bar{y}_{lr}$) of Five Estimators

Population      v_lr     v_L      v_H      v_D      v_J

Cancer          -.14     -.12     -.12     -.06     .09
Cities          -.06     -.04     -.04     -.01     .04
Counties 60     -.15     -.14     -.08     -.02     .16
Counties 70     -.14     -.13     -.16     -.07     .14
Hospitals       -.04     -.03     -.02      .01     .06
Sales           -.24     -.21     -.19     -.12     .11

5. Empirical Study


5.1. Populations Under Study and Simulation Procedure


In Sections 3 and 4, the asymptotic behavior of the variance
estimators was studied. One may ask whether these results are
applicable to moderate sample sizes. The variance estimators given in
Section 2 will be compared empirically on six natural populations. For a
detailed description of these populations, see Royall and Cumberland
(1981). The procedure described below was conducted on the UNIVAC 1100
at the University of Wisconsin-Madison. The uniform numbers were
generated according to subroutine RANUN.

We draw 1000 simple random samples of size 32 from each
population, whose size ranges from 125 to 393. For each sample chosen, we
compute the regression estimate $\bar{y}_{lr}$, sample mean $\bar{x}$ and variance
estimators $v_0$, $v_1$, $v_2$, $v_{\hat{g}}$, $v_L$, $v_H$, $v_D$ and $v_J$. For each simulated
sample and each variance estimate v, we also compute the t-statistic

\[ t = (\bar{y}_{lr} - \bar{Y}) / v^{1/2} \tag{5.1} \]

and the (1 - α) confidence interval for $\bar{Y}$

\[ \big( \bar{y}_{lr} - t_{\alpha/2}(30) \, v^{1/2}, \ \bar{y}_{lr} + t_{\alpha/2}(30) \, v^{1/2} \big), \tag{5.2} \]

where $t_{\alpha/2}(30)$ is the upper α/2 percentile of the t-distribution with
30 d.f.

The unconditional behavior of the estimators can be studied by
taking the average of the corresponding quantity among all 1000
samples. For example, the MSE($\bar{y}_{lr}$) is calculated as $1000^{-1} \sum (\bar{y}_{lr} - \bar{Y})^2$
over the 1000 simulated samples, and the bias of a given variance
estimator v is calculated as $1000^{-1} \sum v$ - MSE($\bar{y}_{lr}$) over the same 1000
samples.

To study their conditional behavior on $\bar{x}$, we divide the 1000
samples into groups according to the following procedure. Rearrange
the 1000 samples in increasing order of $\bar{x}$; divide the 1000 samples
into 10 groups so that the first group has the 100 samples whose $\bar{x}$ values
are the smallest, the next group contains the samples with the next 100
smallest $\bar{x}$ values, and so on. Within each group, we compute the
average of $\bar{x}$ and v, and the actual percentage coverage of each
associated confidence interval.
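The simulation procedure described above can be sketched as follows for a single variance estimator ($v_{lr}$). Everything specific in the code is an assumption made for illustration: the synthetic population stands in for the six natural populations, Python's `random` module stands in for the original generator, and the constant 2.042 is the tabulated upper 2.5% point of Student's t with 30 d.f.

```python
import random

T_975_30DF = 2.042  # tabulated upper 2.5% point of t with 30 d.f.


def fit(xs, ys, X_bar, N):
    """Return (x_bar, regression estimate (2.1), v_lr (2.7)) for one sample."""
    n = len(xs)
    xb = sum(xs) / n
    yb = sum(ys) / n
    sxx = sum((x - xb) ** 2 for x in xs)
    b = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    sse = sum(((y - yb) - b * (x - xb)) ** 2 for x, y in zip(xs, ys))
    est = yb + b * (X_bar - xb)
    var = (1 - n / N) / n * sse / (n - 2)
    return xb, est, var


def simulate(pop_x, pop_y, n=32, reps=1000, groups=10, seed=0):
    rng = random.Random(seed)
    N = len(pop_x)
    X_bar = sum(pop_x) / N
    Y_bar = sum(pop_y) / N
    rows = []
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # simple random sample without replacement
        xb, est, var = fit([pop_x[i] for i in idx], [pop_y[i] for i in idx], X_bar, N)
        half = T_975_30DF * var ** 0.5
        covered = est - half <= Y_bar <= est + half  # does interval (5.2) cover?
        rows.append((xb, est, var, covered))
    # Unconditional summaries: empirical MSE, bias of v, coverage rate.
    mse = sum((r[1] - Y_bar) ** 2 for r in rows) / reps
    bias_v = sum(r[2] for r in rows) / reps - mse
    coverage = sum(r[3] for r in rows) / reps
    # Conditional summaries: sort by x_bar and cut into equal-sized groups.
    rows.sort(key=lambda r: r[0])
    size = reps // groups
    conditional = []
    for g in range(groups):
        chunk = rows[g * size:(g + 1) * size]
        conditional.append((
            sum(r[0] for r in chunk) / size,   # average x_bar in the group
            sum(r[2] for r in chunk) / size,   # average v in the group
            sum(r[3] for r in chunk) / size,   # coverage rate in the group
        ))
    return mse, bias_v, coverage, conditional


# Hypothetical synthetic population of N = 200 units (not one of the six studied).
_rng = random.Random(1)
pop_x = [_rng.uniform(10.0, 100.0) for _ in range(200)]
pop_y = [5.0 + 2.0 * x + _rng.gauss(0.0, 0.5 * x ** 0.5) for x in pop_x]

mse, bias_v, coverage, conditional = simulate(pop_x, pop_y)
```

Because the groups have equal size, the average of the ten conditional coverage rates equals the unconditional rate, which gives a simple consistency check on any table built this way.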

The following three criteria will be used to compare the
performance of the variance estimators: their mean squared error (MSE) and
bias, and the coverage probability of the associated confidence
interval. The simulation results are summarized in Tables 2 and 3.

5.2. MSE of v

The pattern is similar to that of Wu and Deng (1983) for the ratio
estimator.

(a) $v_{\hat{g}}$ has smaller and often the smallest MSE among all the
estimators considered. This is consistent with the asymptotic result of
Section 3.

(b) Among $v_0$, $v_1$ and $v_2$, the best performer is the one whose g is
closest to $g_{opt}$.

(c) The jackknife variance estimator $v_J$ has the largest MSE among all
variance estimators considered.

(d) Among $v_H$, $v_D$ and $v_J$, $v_H$ has the smallest MSE.

(e) $v_H$ has bigger MSE than $v_0$, $v_1$, $v_{\hat{g}}$ and $v_L$.

5.3. Bias of v

(a) All estimators under consideration, except $v_J$, are consistently
downward biased. The downward bias of $v_H$ is predicted in (4.5). Since
$S_{x e^2} U_3 \geq 0$ for all six populations, the downward bias of $v_0$ and $v_L$
is predicted in (4.1) and (4.2).

(b) The estimator $v_J$ is always upward biased while $v_D$ does not show
any pattern. This is again well predicted in (4.4) and (4.6).

(c) $v_D$ has the smallest absolute bias among all the estimators. The
reason is that $v_D$ is the only estimator with a lower order bias.

(d) $v_L$ has a smaller bias than $v_0$, $v_1$, $v_2$, and $v_{\hat{g}}$.


Table 2. Root mean-square error and bias* of v's

                                 Population
Estimator       1         2         3         4         5         6

v_0           2.91      52.6      13.6      22.0      6.75      24.9
             (-1.2)    (-5.2)   (-10.0)    (-5.6)    (-1.6)   (-13.8)

v_1           2.51      51.3      13.1      19.0      6.12      20.9
             (-1.3)    (-5.8)   (-10.0)    (-6.9)    (-1.8)   (-14.7)

v_2           2.39      54.4      13.3      18.3      6.24      19.4
             (-1.3)    (-4.9)    (-9.4)    (-7.3)    (-1.7)   (-13.7)

v_{\hat g}    2.49      51.1      13.2      18.9      6.24      20.7
             (-1.4)    (-6.9)    (-9.1)    (-7.0)    (-1.8)   (-14.5)

v_L           2.92      55.1      13.1      22.6      6.85      24.2
             (-1.0)    (-1.2)    (-9.0)    (-4.8)    (-1.1)   (-11.7)

v_H           2.74      59.4      17.9      23.3      7.64      22.0
             (-1.1)    (-1.4)    (-5.7)    (-5.5)    (-0.9)    (-9.6)

v_D           3.42      66.1      22.4      30.9      8.92      26.7
             (-.5)     (+.5)     (-2.6)    (-1.6)    (+.08)    (-4.4)

v_J           5.56      84.2      37.1      52.6     11.36      48.1
             (+0.6)   (+16.8)    (+5.3)    (+7.1)    (+1.8)    (+9.9)

g_opt         1.55      1.20      0.88      2.40      1.46      1.53

Unit          1        10000     1000      1000      100      100000

* Bias given inside the parentheses


5.4. Behavior of the Confidence Intervals

Only the results on populations 1 and 6 are reported in Table 3.
They are representative of a bigger study in Deng (1984), which is well
summarized by the following conclusions.

(a) Normality of the t-statistic:

(a1) The behavior of the t-statistic is similar to the Student
t-distribution: the bias and skewness are close to zero and the standard
deviation is close to one.

(a2) The t-statistic associated with $v_0$ has the largest variance
while that associated with $v_J$ has the smallest.


(b)  Unconditional  coverage  probability: 

(b1) For all six populations, the coverage probability is lower than
the nominal level 1 - α.

(b2) The confidence interval associated with $v_J$ has the closest
coverage probability to the nominal level while that associated with $v_0$
has the lowest coverage probability.

(b3) The confidence interval associated with $v_J$ has the best
performance among all estimators considered. The superior performance of $v_J$
can be explained in part by the large values of E($v_J$).

(b4) Among $v_0$, $v_1$ and $v_2$, $v_2$ is the best and $v_0$ the worst.

(b5) Among $v_H$, $v_D$ and $v_J$, $v_J$ is the best and $v_H$ the worst. This
may partly be explained by the results in Section 3 where $v_H$ was
shown to be stochastically smaller than $v_D$ and $v_D$ smaller than $v_J$.


(c)  Conditional  coverage  probability: 

(c1) We can clearly see the excellent performance of the conditional
coverage probabilities associated with $v_J$. They do not fluctuate very
much as $\bar{x}$ varies.

(c2) Compared with the other estimators, the coverage probabilities
associated with $v_H$, $v_D$, $v_J$ are pretty stable over $\bar{x}$, whereas those
associated with $v_0$, $v_1$, $v_{\hat{g}}$ are increasing in $\bar{x}$. For example, in
population 1, the actual coverage probability of the 95% confidence
interval associated with $v_0$ in the first group is as low as 73% and
in the last group as high as 99%.

(c3) Among $v_0$, $v_1$, $v_2$, $v_2$ has the most stable conditional coverage
probabilities.

(c4) Among $v_H$, $v_D$, $v_J$, $v_J$ has bigger coverage probabilities than
that of $v_D$ for each group; and $v_D$ bigger than $v_H$. This again can be
explained by our asymptotic results in Section 3.

(c5) For "nearly" balanced samples (i.e. $\bar{x}$ close to $\bar{X}$), all
estimators perform similarly. For example, for each population the 5-th
and 6-th groups have similar coverage probabilities for all estimators.


Table 3. Coverage probabilities of the t-intervals in (5.2)
and descriptive statistics of t in (5.1), based on 1000 samples

Population 1

           99%    95%    90%     Bias     Var.    Skew.    Kurt.
$t_0$     94.3   88.5   80.5   -.1311   1.9640   -.0925   4.7126
$t_1$     95.1   89.3   82.5   -.1223   1.7470   -.0593   4.3104
$t_2$     96.4   89.8   84.2   -.1151   1.6067   -.0203   3.8002
$t_g$     94.9   89.0   82.4   -.1217   1.8276   -.0352   4.4895
$t_L$     94.7   89.1   82.0   -.1232   1.8255   -.0760   4.5755
$t_H$     96.0   90.1   83.5   -.1105   1.6160   -.0225   4.1515
$t_D$     96.4   91.4   85.2   -.1077   1.4987   -.0581   4.2578
$t_J$     97.3   92.7   87.6   -.1050   1.3053   -.1010   4.3977

Conditional 95% C.I. coverage probability

$\bar{x}$   $t_0$  $t_1$  $t_2$  $t_H$  $t_D$  $t_J$  $t_g$  $t_L$
  76.0       73     81     85     87     88     91     80     79
  88.4       78     81     84     82     84     89     81     80
  96.5       76     77     82     81     82     85     77     76
 102.6       84     85     88     88     89     90     84     84
 109.2       94     94     95     95     95     96     94     94
 115.4       87     86     85     87     91     91     87     87
 121.9       96     95     92     95     96     96     96     96
 128.2       97     96     95     95     97     97     97     97
 137.2       99     97     96     95     96     96     97     99
 157.1       99     98     96     96     96     96     97     99
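As a consistency check on the Population 1 entries, the unconditional 95% coverage should equal the average of the ten conditional coverages when the groups are of equal size. The sketch below assumes ten groups of 100 samples each (the group size is an assumption; only the total of 1000 samples is stated) and uses the conditional columns headed $t_2$, $t_H$, $t_D$ and $t_J$.

```python
# Conditional 95% coverage (percent) in the ten x-bar groups of Population 1,
# copied from the columns headed t2, tH, tD and tJ of the conditional table.
conditional = {
    "t2": [85, 84, 82, 88, 95, 85, 92, 95, 96, 96],
    "tH": [87, 82, 81, 88, 95, 87, 95, 95, 95, 96],
    "tD": [88, 84, 82, 89, 95, 91, 96, 97, 96, 96],
    "tJ": [91, 89, 85, 90, 96, 91, 96, 97, 96, 96],
}


def pooled_coverage(group_percentages):
    """Average group coverages, assuming equal-size groups (an assumption)."""
    return sum(group_percentages) / len(group_percentages)


pooled = {name: round(pooled_coverage(values), 1)
          for name, values in conditional.items()}
print(pooled)  # {'t2': 89.8, 'tH': 90.1, 'tD': 91.4, 'tJ': 92.7}
```

The four averages reproduce the values 89.8, 90.1, 91.4 and 92.7 that appear in the 95% column of the upper part of the table.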
Population 6

           99%    95%    90%     Bias     Var.    Skew.    Kurt.
$t_0$     91.8   84.1   77.4   -.0793   2.4208   -.1271   5.2355
$t_1$     94.2   85.7   79.4   -.0784   2.0163   -.1136   4.5610
$t_2$     95.8   87.9   80.2   -.0772   1.7840   -.0474   4.0702
$t_g$     93.8   85.1   79.1   -.0753
Proof of Lemma 3.1. Parts (b) and (c) follow from (a) and formulas
(13) and (14) of Wu (1982a). To prove (a), from formula (7.31) of
Cochran (1977) and formulas (3.2), (3.5) and (3.6), we have


$$\sum_{i=1}^{n} [(y_i - \bar{y}) - b(x_i - \bar{x})]^2
  = \sum_{i=1}^{n} (e_i - \bar{e})^2 - (b - B)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} e_i^2 + O_p(1). \qquad (A3.1)$$


This  proves  part  (a). 
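The first step of (A3.1) rests on an exact algebraic identity: the sample residual sum of squares equals $\sum (e_i - \bar{e})^2 - (b - B)^2 \sum (x_i - \bar{x})^2$. The sketch below checks this numerically on a synthetic population. It assumes that $e_i = y_i - \bar{Y} - B(x_i - \bar{X})$ with $B$ the population least-squares slope; this definition is an assumption here, since formulas (3.2), (3.5) and (3.6) are not reproduced in this section.

```python
import random

rng = random.Random(1)
N, n = 200, 20

# Hypothetical finite population.
xs = [rng.uniform(0, 10) for _ in range(N)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
X_bar = sum(xs) / N
Y_bar = sum(ys) / N
B = (sum((x - X_bar) * (y - Y_bar) for x, y in zip(xs, ys))
     / sum((x - X_bar) ** 2 for x in xs))  # population slope

# One simple random sample without replacement.
idx = rng.sample(range(N), n)
x = [xs[i] for i in idx]
y = [ys[i] for i in idx]
e = [yi - Y_bar - B * (xi - X_bar) for xi, yi in zip(x, y)]  # assumed e_i

x_bar, y_bar, e_bar = sum(x) / n, sum(y) / n, sum(e) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

lhs = sum(((yi - y_bar) - b * (xi - x_bar)) ** 2 for xi, yi in zip(x, y))
rhs = sum((ei - e_bar) ** 2 for ei in e) - (b - B) ** 2 * sxx
print(abs(lhs - rhs) < 1e-8)  # the two sides agree up to rounding error
```

The remaining step, replacing the right-hand side by $\sum e_i^2 + O_p(1)$, is an order statement and is not checked by this sketch.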


Proof  of  Lemma  3.2.  Note  that 


$$\bar{x}_r - \bar{x} = \frac{N\bar{X} - n\bar{x}}{N - n} - \bar{x}
  = -\frac{1}{1-f}(\bar{x} - \bar{X}). \qquad (A3.2)$$
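Relation (A3.2) can be checked numerically. The sketch below assumes that $\bar{x}_r$ denotes the mean of x over the $N - n$ non-sampled units and that $f = n/N$; both readings are assumptions made for this illustration, and the population values are synthetic.

```python
import random

rng = random.Random(2)
N, n = 120, 15
f = n / N  # sampling fraction

xs = [rng.uniform(50, 150) for _ in range(N)]  # hypothetical x values
sampled = set(rng.sample(range(N), n))

x_bar = sum(xs[i] for i in sampled) / n                            # sample mean
x_bar_r = sum(xs[i] for i in range(N) if i not in sampled) / (N - n)  # non-sample mean
X_bar = sum(xs) / N                                                # population mean

lhs = x_bar_r - x_bar
rhs = -(x_bar - X_bar) / (1 - f)
print(abs(lhs - rhs) < 1e-9)  # the two sides agree up to rounding error
```

The identity holds for every sample, not only on average, because $N\bar{X} = n\bar{x} + (N - n)\bar{x}_r$.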


From (3.14) and (A3.2), the numerator of $a_i$ in (2.15) is equal to


=  1  -  Xj 


x)  +  xi  “  x)2  + 


^Cl  -  2p(  ■xi  -  x)  +  p2(  x±  -  x)23  +  0p(n*1,5)  .  (A3. 3) 


We used the facts $p^2 = O_p(n^{-1})$ and $f = O(n^{-0.5})$ in deriving (A3.3).


From (2.15) and (A3.3), we obtain

$$a_i = (1-f)^{-1} (1 - p(x_i - \bar{x}))^2 / (1 - q(x_i - \bar{x})^2) + O_p(n^{-1.5}), \qquad (A3.4)$$

which easily implies (3.12). Formula (3.13) follows from (3.12) and

$$(1 - q(x_i - \bar{x})^2)^{-1} = 1 + q(x_i - \bar{x})^2 + O_p(n^{-1.5}).$$
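The last expansion is the first-order geometric series with an explicit remainder. As a worked sketch, write $u = q(x_i - \bar{x})^2$ (the symbol $u$ is introduced here only for illustration):

```latex
\[
  \frac{1}{1-u} \;=\; 1 + u + \frac{u^{2}}{1-u},
  \qquad\text{since}\qquad
  (1+u)(1-u) + u^{2} = 1 .
\]
% The remainder u^2/(1-u) is of second order in u. Hence, provided u is
% small enough in probability (for example u = O_p(n^{-0.75}) or smaller,
% an assumption not restated in this appendix), the remainder is
% O_p(n^{-1.5}), which is the order quoted in the proof.
```

The order of $q$ itself is fixed in Lemma 3.2, so the condition on $u$ above should be read against that lemma.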


Proof  of  Lemma  3.3. 

From  w^  =  0p(n  ^  and  ^i 


1.  .  ■-  -  o  (n_1),  PA  in  vH  satisfies 
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r.2  +  0„(n-2 ) 


From  (2.14),  we  have 


i 7  n  a  7  T_f  n  ^  _2  «; 
From (3.14) and (A3.2),
which gives the desired result since $p^2 = O_p(n^{-1})$ and $f = O(n^{-0.5})$.


Proof  of  Lemma  3.5.  From  formula  (6.1)  of  Royall  and  Cumberland 
(1978,  p.357),  we  have 
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(A3. 7) 


x^2  is  the  second  sample  moment  of  x  and  k^  is  defined  in  (2.19) 


We  then  show  r 


=  0  (n~J).  From  g,  =  0  (n  )  and  k.  =  0 

n  p  ’1  p  IF 


(n  ) , 


the  second  term  inside  the  square  bracket  of  (A3. 6)  is  0p(n  >.  Its 
first  term 
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n  a  i  ^  a  n  a  i 
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Therefore  r  =  0  (N~2)  *  0  (f2  n-2)  =  0  (n-3).  The  proof  Is  com- 
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pleted  by  applying  1  +  =*  £  (1  -  p(  -  x))  and  (1  - 

kj)  =  (1  -  q(  xi  -  x)2>  to  (A3. 5). 

Proof of Lemma 3.6. It follows easily from Lemma 3.2, (3.11), Lemma
3.5 and
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1  1  P 

Proof of Theorem 4.1. Using (3.3) and (3.7), we can show that
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To  compute  the  expectation  of  (A4.1),  we  need  the  following  formulas: 
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which  and  (4.1)  imply  the  result. 


Proof  of  (4.3).  From 
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Taking  the  expectation,  we  get 
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which  together  with  Theorem  4.1(c)  gives  the  result. 

To prove (4.4), we need the following formulas and Lemmas A4.1
and A4.2. Formulas (A4.9)-(A4.11) give the leading terms of $p$, $p^2$
and $q$, defined in Lemma 3.2,
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where  vi  and  V  are  defined  in  (3.4). 


Lemma A4.1.
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from  which  the  result  follows  easily. 
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which  implies  Lemma  A4.2,  by  using  (3.2)  and  (3.5). 

Proof  (4.4).  Using  (A4.9)  and  Lemma  A4.2,  the  second  term  of  v 
in  (3.13)  is 
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The  third  term  of  vQ  in  (3.13)  can  be  simplified  by  using 


(A4.10), (A4.11) and Lemma A4.1,